PSCI 3300.003 Political Science Research Methods
University of North Texas
3/9/23
Intro to Multiple Regression Analysis
Review of estimators
Gauss-Markov Assumptions
Drawing “lines of best fit” through data
Lines, Greek, and regression analysis
Counterfactuals, uncertainty, and effect existence
Assumptions and credibility
Regression analysis is the primary workhorse in the social sciences
\[ \definecolor{treat}{RGB}{27,208,213} \definecolor{outcome}{RGB}{98,252,107} \definecolor{baseconf}{RGB}{244,199,58} \definecolor{covariates}{RGB}{178,26,1} \definecolor{index}{RGB}{37,236,167} \definecolor{timeid}{RGB}{244,101,22} \definecolor{mu}{RGB}{71,119,239} \definecolor{sigma}{RGB}{219,58,7} \newcommand{normalcolor}{\color{white}} \newcommand{treat}[1]{\color{treat} #1 \normalcolor} \newcommand{resp}[1]{\color{outcome} #1 \normalcolor} \newcommand{sample}[1]{\color{baseconf} #1 \normalcolor} \newcommand{covar}[1]{\color{covariates} #1 \normalcolor} \newcommand{obs}[1]{\color{index} #1 \normalcolor} \newcommand{tim}[1]{\color{timeid} #1 \normalcolor} \newcommand{mean}[1]{\color{mu} #1 \normalcolor} \newcommand{vari}[1]{\color{sigma} #1 \normalcolor} \]
Estimators have no inherent use to us without properties.
What are the desirable properties of an estimator?
Small Sample Properties
Unbiasedness
Efficiency
Large Sample Properties
If \(\mathrm{E}(\hat{\theta}) = \theta\), that is, \(\mathrm{E}(\hat{\theta}) - \theta = 0\), our estimator is said to be unbiased.
Conversely, \(\mathrm{E}(\hat{\theta}) - \theta \ne 0\) implies that our estimator is biased.
Unbiasedness is a repeated sampling property that tells us something about the central tendency of the sampling distribution.
Possible sources of bias
Non-random samples
Model misspecification
Endogeneity
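Unbiasedness as a repeated-sampling property is easiest to see by simulation. The sketch below is an illustration, not from the slides; the population and the truncation point are arbitrary assumptions. It shows that the sample mean recovers the population mean on average under random sampling, but not under a non-random (truncated) sampling scheme:

```r
# Minimal illustration: unbiasedness is a repeated-sampling property
set.seed(123)

n_sims <- 5000
sample_size <- 50

# Sample mean under random sampling from a population with mean theta = 0
random_means <- replicate(n_sims, mean(rnorm(sample_size, mean = 0, sd = 1)))

# Sample mean under a non-random sampling scheme: we only ever
# observe values above -0.5 (a form of sample selection)
truncated_means <- replicate(n_sims, {
  x <- rnorm(sample_size * 4, mean = 0, sd = 1)
  mean(x[x > -0.5][1:sample_size])
})

# Averaging over repeated samples recovers theta only for the random design
mean(random_means)     # close to 0 (unbiased)
mean(truncated_means)  # well above 0 (biased by sample selection)
```

The second estimator is biased no matter how many repeated samples we average over, because the sampling design systematically excludes part of the population.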
An efficient estimator is the one with the smallest sampling variance among competing estimators, such that \(var(\hat{\theta}) < var(\tilde{\theta})\) for any alternative estimator \(\tilde{\theta}\).
If \(\hat{\theta}\) is a linear function of sample data, then \(\hat{\theta}\) is a linear estimator.
Therefore, if \(\hat{\theta}\) is linear, unbiased, and efficient, then within the class of linear unbiased estimators it is the “best.”
In the case of linear regression, if the assumptions imposed by the Gauss-Markov theorem hold, it is the best linear unbiased estimator.
Some estimators will not satisfy these properties in small samples and require a large sample size for approximately equivalent properties to hold.
These large sample properties are called asymptotic properties and are directly tied to the sample size.
Asymptotic unbiasedness is expressed as \(\lim_{n\rightarrow\infty}\mathrm{E}(\hat{\theta}_{n}) = \theta\)
An example is the variance estimator \[s^{2} = \frac{\sum_{i}(x_{i}-\bar{x})^2}{n}\] which divides by \(n\) rather than \(n - 1\)
This estimator is asymptotically unbiased. That is, as \(n\) tends towards \(\infty\), the bias of the estimator tends towards zero.
Consistency is a probabilistic statement \[\lim_{n\rightarrow\infty}Pr(|\hat{\theta} - \theta| < \delta) = 1 \qquad \delta > 0\]
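A short simulation can illustrate both ideas for the variance estimator above (a sketch; the population values and sample sizes are arbitrary assumptions):

```r
# Minimal sketch: the n-denominator variance estimator is biased in small
# samples but asymptotically unbiased
set.seed(42)
true_var <- 4  # population variance (sd = 2)

bias_at_n <- function(n, n_sims = 5000) {
  s2 <- replicate(n_sims, {
    x <- rnorm(n, mean = 0, sd = 2)
    sum((x - mean(x))^2) / n  # divides by n, not n - 1
  })
  mean(s2) - true_var  # average bias across repeated samples
}

bias_at_n(5)    # noticeably negative: E[s^2] = ((n-1)/n) * sigma^2
bias_at_n(50)   # much closer to zero
bias_at_n(500)  # nearly unbiased
```

As \(n\) grows the bias shrinks towards zero, which is exactly what the limit statement above says.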
The central limit theorem is an asymptotic theorem.
While unbiasedness can hold for any sample size, consistency is a purely asymptotic property.
The uncertainty principle looms large, and most of the time the closest you will ever come to an unbiased estimate of \(\mathrm{E[\resp{Y} ~|~ \treat{X}]}\) is a blurry average
It is impossible to escape the bias-variance trade-off
Depending on the inferential goals, it may be desirable to accept some bias in exchange for a large reduction in variance
Beware of claims made by people who suggest otherwise
In lots of cases there is no unbiased estimator and it is generally advisable to conduct some form of sensitivity analysis
Remember, stupid assumptions lead to stupid results
The Gauss-Markov theorem states that if a series of assumptions hold, ordinary least squares is the best linear unbiased estimator.
The dependent variable is quantitative, continuous and unbounded such that \(\resp{y} \in (-\infty, \infty)\)
There is a linear relationship between the dependent variable \(Y\) and each of the predictors \(X_{k} ~ \forall~ k \in \{1,2,\dots, K\}\)
\[\resp{Y} = \beta_{0} + \beta_{1}X_{1} + \beta_{2}X_{2} + \dots + \beta_{K}X_{K} + \epsilon\]
The error term \(\epsilon\) is statistically independent of each and every predictor \(X_{k}\)
The conditional variance of the error term \(\epsilon\) is constant (homoskedasticity) such that \[Var(\epsilon ~|~ X_{k}) = \sigma^{2} \quad \forall \quad X_{k}\]
The error term \(\epsilon\) is independently distributed and uncorrelated across space and time \[Cov(\epsilon_{i}, \epsilon_{j}) = 0 \quad \forall \quad i \neq j\]
None of the independent variables can be written as a perfect linear function of the others (no perfect multicollinearity).
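When the no-perfect-multicollinearity assumption fails, OLS cannot estimate separate coefficients for the collinear predictors. A minimal R sketch (the data and variable names here are illustrative, not from the slides):

```r
# Minimal sketch: lm() cannot separate perfectly collinear predictors
set.seed(1)
df <- data.frame(x1 = rnorm(100))
df$x2 <- 2 * df$x1 + 1          # x2 is a perfect linear function of x1
df$y  <- 1 + 3 * df$x1 + rnorm(100)

fit <- lm(y ~ x1 + x2, data = df)
coef(fit)  # the coefficient on x2 is NA: lm() drops the aliased column
```

R handles this case by dropping one of the offending predictors, but in real data near-perfect collinearity usually shows up instead as enormous standard errors.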
The outcome variable, response variable, or dependent variable
What we’ve been referring to thus far as \(\resp{Y}\)
The outcome is the thing we are trying to explain or predict
The explanatory variables, predictor variables, or independent variables
What we’ve been referring to thus far as \(\treat{X}\), \(\covar{Z}\), etc.
Explanatory variables are things we use to explain or predict variation in \(\resp{Y}\)
A study that examines the effect of conflict (\(\treat{X}\)) on economic development (\(\resp{Y}\))
Researchers attempt to predict the onset of genocides (\(\resp{Y}\)) by looking at ethnic cleavages (\(\treat{X}\)), revolutions in neighboring countries (\(\covar{Z}\)), and economic growth (\(\covar{W}\))
Netflix uses your past viewing history (\(\treat{X}\)), the day of the week (\(\covar{W}\)), and the time of the day (\(\covar{Z}\)) to guess which show you want to watch next (\(\resp{Y}\))
Remember the distinction between prediction, causal explanation, and description is important
Prediction
Useful if we want to forecast the future
Focus is on predicting future values of \(\resp{Y}\)
Netflix trying to guess your next show or predicting who will enroll in SNAP
Explanation
Here we want to explain the effect of \(\treat{X}\) on \(\resp{Y}\)
Focus is on the effect of \(\treat{X}\) on \(\resp{Y}\)
Estimating the effect of conflict on economic growth
Assume \(\treat{X}\) and \(\resp{Y}\) are both theoretically continuous quantities
Draw a line to approximate the relationship between \(\treat{X}\) and \(\resp{Y}\)
Ideally, a line that could plausibly work for data not in the sample
Find the mathy parts of the line and then interpret the math
Let’s look at a simulated example
We assume the following Data Generation Process (DGP) where the fixed parameters \(\beta_{1} = 2.5\) and \(\alpha = 1.5\)
\[ \begin{align} \resp{Y}_{\obs{i}} &= \alpha~+~\beta_{1}\cdot \treat{X}_{\obs{i}} + \vari{\epsilon}_{\obs{i}}\\ \mathrm{where}\\ \treat{X} &\sim \mathrm{Uniform}(-3, 3)\\ \vari{\epsilon} &\sim \mathrm{Normal}(0, 2.5)\\ \end{align} \]
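In R, sim_data can be generated from this DGP as follows (a sketch; the slide does not state the sample size or random seed, so n = 50 and the seed are assumptions):

```r
# Simulate data from the assumed DGP
set.seed(3300)  # assumed seed for reproducibility
n     <- 50    # assumed sample size
alpha <- 1.5   # fixed intercept
beta1 <- 2.5   # fixed slope

X <- runif(n, min = -3, max = 3)            # X ~ Uniform(-3, 3)
epsilon <- rnorm(n, mean = 0, sd = 2.5)     # epsilon ~ Normal(0, 2.5)
Y <- alpha + beta1 * X + epsilon

sim_data <- data.frame(X = X, Y = Y)
```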
We can use lm to fit a linear model to the simulated data and broom::tidy to get a quick summary of the result
# Fit a linear regression model
ols <- lm(Y ~ X, data = sim_data)
# Get a summary of the results
ols_summary <- broom::tidy(ols, conf.int = TRUE)
# Get the residuals and fitted values
fitted <- broom::augment(ols)
# Print the summary
print(ols_summary)
# A tibble: 2 × 7
  term        estimate std.error statistic  p.value conf.low conf.high
  <chr>          <dbl>     <dbl>     <dbl>    <dbl>    <dbl>     <dbl>
1 (Intercept)     2.56     0.807      3.17 2.65e- 3    0.936      4.18
2 X               2.29     0.272      8.42 5.17e-11    1.75       2.84
# Initiate the plot object
ggplot(sim_data, aes(x = X, y = Y)) +
# Add the data points
geom_point(fill = "cyan", shape = 21, size = 4) +
# Labels for the plot
labs(
x = "X",
y = "Y"
) +
# Adjust the x axis scales
scale_x_continuous(breaks = scales::pretty_breaks(n = 8)) +
# Adjust the y axis scales
scale_y_continuous(breaks = scales::pretty_breaks(n = 8)) +
# Plot theme settings
theme_minimal(
base_family = "serif",
base_size = 24
) -> base_plot
# Print the plot
print(base_plot)
# Basic linear fit
linear_plot <- base_plot +
geom_smooth(method = lm, color = "white",
se = FALSE, size = 2, lty = 2)
# Semi-parametric spline fit
spline_plot <- base_plot +
geom_smooth(method = lm, color = "white",
formula = y ~ splines::bs(x, 7),
se = FALSE, size = 2)
# Non-parametric loess fit
loess_plot <- base_plot +
geom_smooth(method = "loess", color = "white",
se = FALSE, size = 2)
Remember \(y = mx + b\) from high school algebra
We can express a standard linear regression equation as \[\resp{y}_{\obs{i}} = \hat{\alpha} + \hat{\beta}_{1} \cdot \treat{x}_{\obs{i}} + \vari{\hat{\epsilon}}_{\obs{i}}\]
\(\resp{\hat{y}}_{\obs{i}} = \hat{\alpha} + \hat{\beta}_{1} \cdot \treat{x}_{\obs{i}}\) is the predicted value of the response for the \(\obs{i^{th}}\) observation
\(\hat{\alpha}\) is the intercept, typically the expected value of \(\resp{y}\) when \(\treat{x} = 0\)
\(\hat{\beta}_{1}\) is the slope coefficient, the average change in \(\resp{y}\) for each one unit increase in \(\treat{x}\)
\(\vari{\hat{\epsilon}}_{\obs{i}}\) is the residual, the difference between the observed and predicted values of \(\resp{y}_{\obs{i}}\)
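Plugging the estimates from the regression output earlier into this equation gives a concrete prediction. At \(\treat{x} = 1\), for example: \[\resp{\hat{y}} = \hat{\alpha} + \hat{\beta}_{1} \cdot 1 = 2.56 + 2.29 = 4.85\]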
The intercept \(\hat{\alpha}\) captures the baseline value
The slope \(\hat{\beta}_{1}\) captures the rate of linear change in \(y\) as \(x\) increases
What happens if we change the values of \(\alpha\) or \(\beta\) in our simulation?
We use \(\beta\) and \(\alpha\) to represent the unknown parameter values for the slope and intercept respectively
We use \(\hat{\beta}\) and \(\hat{\alpha}\) to represent our estimates of these parameters
Regression is a tool for obtaining estimates \(\hat{\beta}\) and \(\hat{\alpha}\) that we hope approximate the true values \(\beta\) and \(\alpha\)
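The distinction between \(\beta\) and \(\hat{\beta}\) can also be seen by simulation (a sketch under the same DGP as above; n = 50 and the seed are assumptions): each new sample yields a different \(\hat{\beta}_{1}\), but the estimates center on the true \(\beta_{1} = 2.5\).

```r
# Sketch: the sampling distribution of beta-hat centers on the true beta
set.seed(3300)
n_sims <- 2000

beta_hats <- replicate(n_sims, {
  X <- runif(50, -3, 3)                    # redraw the sample each time
  Y <- 1.5 + 2.5 * X + rnorm(50, 0, 2.5)   # same DGP as the slides
  coef(lm(Y ~ X))["X"]                     # slope estimate for this sample
})

mean(beta_hats)  # close to the true slope of 2.5
sd(beta_hats)    # spread of the sampling distribution of beta-hat
```

Any single \(\hat{\beta}_{1}\) may be far from 2.5; unbiasedness is a statement about the average over repeated samples, not about any one estimate.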
Questions?
Regression with more than one independent variable
Regression adjustment is the primary tool for dealing with confounding, but by no means the only one
When and how the Gauss-Markov theorem fails and what to do about it
Probabilistic formulations of linear regression and an introduction to applied Bayesian inference